ChromaDB
Spring AI & Vector Databases: A Fun Ride into the World of AI-Powered Search
What is a Vector Database?
Imagine you're Sherlock Holmes, but instead of a magnifying glass, you have a powerful AI that finds related stuff in a giant pile of data! That's what a vector database does! It stores vector embeddings (fancy term for long lists of numbers) and helps with similarity searches. But hold on, a vector store doesn't create the embeddings; it just stores them. To create these magical vectors, you need an EmbeddingModel.
What's a Vector?
A vector is basically a super-smart list of numbers that represents things like text, images, audio, or even your pet's mood (ok, maybe not that last one). Taken together, the numbers in a vector capture attributes like sentiment, intensity, and context.
By calculating distances between these vectors, we can find similarities. This process is called semantic search, because it's not just searching words, it's searching meaning!
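To make the idea of "closest vector wins" concrete, here is a minimal sketch in plain Java. The three-number vectors and the labels are hand-made for illustration (they are assumptions, not output from a real EmbeddingModel, which would produce much longer vectors), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NearestVectorDemo {

    // Straight-line (Euclidean) distance between two equal-length vectors.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the label whose vector lies closest to the query vector.
    static String nearest(Map<String, double[]> vectors, double[] query) {
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> entry : vectors.entrySet()) {
            double d = distance(entry.getValue(), query);
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy vectors invented for this example.
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("detective", new double[] {0.9, 0.1, 0.0});
        vectors.put("banana", new double[] {0.0, 0.2, 0.9});
        vectors.put("investigator", new double[] {0.8, 0.2, 0.1});

        // Pretend this is the embedding of the search text.
        double[] query = {0.9, 0.1, 0.05};
        System.out.println(nearest(vectors, query)); // prints "detective"
    }
}
```

A real vector database does the same comparison, only over far more vectors and usually with an index so it does not have to scan every entry.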
How Does a Vector Database Work?
- Data Storage: Your raw data (text, images, videos, etc.) gets transformed into vectors using an AI model and stored.
- Data Retrieval: When you search, your query is converted into a vector, and the database finds the most similar vectors.
- Calculating Similarity: Various formulas are used to find the "closeness" of two vectors:
  - Euclidean Distance: Straight-line distance between two vectors.
  - Cosine Similarity: Measures the cosine of the angle between vectors, ignoring their lengths.
  - Manhattan Distance: The sum of absolute differences between vector components.
  - Jaccard Similarity: Measures overlap between two sets.
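The four measures listed above can be sketched in a few lines of plain Java. This is an illustrative sketch, not how any particular vector database implements them; the class name and the conventions for zero vectors and empty sets are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

class SimilarityMeasures {

    // Euclidean distance: square root of the summed squared differences.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Manhattan distance: sum of the absolute differences.
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Cosine similarity: dot product divided by the product of the vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumed convention: a zero vector is similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static <T> double jaccard(Set<T> a, Set<T> b) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0; // assumed convention: two empty sets count as identical
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 6, 3};
        System.out.println(euclidean(a, b)); // 5.0
        System.out.println(manhattan(a, b)); // 7.0
        System.out.println(cosine(new double[] {3, 4}, new double[] {4, 3}));
        System.out.println(jaccard(Set.of("a", "b", "c"), Set.of("b", "c", "d"))); // 0.5
    }
}
```

Note that the first and third are distances (smaller means closer), while cosine and Jaccard are similarities (larger means closer).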
Popular vector databases include Pinecone, Elasticsearch, Chroma, Weaviate, and Qdrant. Some are open-source, some are startups, and all are pretty cool!
Spring AI & The VectorStore Interface
In Spring AI, the VectorStore interface is your go-to tool for working with vector databases. It helps you store documents and perform similarity searches. Here's what it looks like:
public interface VectorStore {
    void add(List<Document> documents);
    Optional<Boolean> delete(List<String> idList);
    List<Document> similaritySearch(String query);
    List<Document> similaritySearch(SearchRequest request);
}
And the Document class:
public class Document implements Content {
    private Map<String, Object> metadata;
    private String content;
    private List<Double> embedding = new ArrayList<>();
    //...
}
Storing & Querying Documents
Spring Boot makes life easy by autoconfiguring a VectorStore bean when it detects a vector database starter module. Example:
@Autowired
VectorStore vectorStore;
Storing Documents
List<Document> documents = List.of(
    new Document("...content..."),
    new Document("...content..."),
    new Document("...content...")
);
vectorStore.add(documents);
Searching for Similar Documents
List<Document> results = vectorStore.similaritySearch(
    SearchRequest.query("...search-terms...").withTopK(5)
);
Spring AI supports multiple vector databases, and more will be added in the future. Check the official docs for the latest list.
SimpleVectorStore: A No-Fuss Demo Store
For quick demos, use SimpleVectorStore, a lightweight version of a vector store, similar to H2 for relational databases.
public class SimpleVectorStore implements VectorStore {
    public void add(List<Document> documents) {...}
    public List<Document> similaritySearch(SearchRequest request) {...}
    public void save(File file) {...}
    public void load(File file) {...}
    public void load(Resource resource) {...}
    //...
}
To use it:
@Bean
SimpleVectorStore vectorStore(EmbeddingModel embeddingModel) {
    return new SimpleVectorStore(embeddingModel);
}
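To get a feel for what a simple demo store does conceptually, here is a toy sketch in plain Java with no Spring AI dependency. It is not the SimpleVectorStore implementation: ToyVectorStore and its letterCounts "embedder" are hypothetical names made up for this example, and counting letters only captures spelling, not meaning. The point is the flow: embed on add, embed the query, rank by similarity, return the top K.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToyVectorStore {

    // A stored text together with its embedding vector.
    static final class Entry {
        final String content;
        final double[] embedding;

        Entry(String content, double[] embedding) {
            this.content = content;
            this.embedding = embedding;
        }
    }

    private final Function<String, double[]> embedder;
    private final List<Entry> entries = new ArrayList<>();

    ToyVectorStore(Function<String, double[]> embedder) {
        this.embedder = embedder;
    }

    // Embeds each text once and keeps the result in memory.
    void add(List<String> texts) {
        for (String text : texts) {
            entries.add(new Entry(text, embedder.apply(text)));
        }
    }

    // Embeds the query, ranks entries by cosine similarity (highest first), returns the top K texts.
    List<String> similaritySearch(String query, int topK) {
        double[] q = embedder.apply(query);
        return entries.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> -cosine(q, e.embedding)))
                .limit(topK)
                .map(e -> e.content)
                .collect(Collectors.toList());
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Hypothetical stand-in for an EmbeddingModel: a 26-slot count of the letters a-z.
    static double[] letterCounts(String text) {
        double[] vector = new double[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                vector[c - 'a']++;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        ToyVectorStore store = new ToyVectorStore(ToyVectorStore::letterCounts);
        store.add(List.of("calling rates", "banana bread", "moody pup"));
        System.out.println(store.similaritySearch("call rates", 2));
    }
}
```

Swapping the embedder for a real model is what turns this from "similar spelling" into "similar meaning", which is why the store takes the embedding function as a constructor argument, much like the bean definition above takes an EmbeddingModel.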
Vector Store Demo with ChromaDB
Let's see Spring AI in action by setting up a ChromaDB-powered vector store.
Step 1: Setup ChromaDB with Docker Compose
version: '3.9'
networks:
  net:
    driver: bridge
services:
  server:
    image: ghcr.io/chroma-core/chroma:latest
    environment:
      - IS_PERSISTENT=TRUE
    volumes:
      - chroma-data:/chroma/chroma/
    ports:
      - 8000:8000
    networks:
      - net
volumes:
  chroma-data:
    driver: local
Step 2: Add Dependencies
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-chroma-store-spring-boot-starter</artifactId>
</dependency>
<!-- Needed for the TikaDocumentReader used in Step 3 -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-docker-compose</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>
Step 3: Load Data into Vector Store
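import java.util.ArrayList;
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Component;

@Component
public class VectorStoreLoader implements ApplicationRunner {
    @Autowired
    VectorStore vectorStore;

    // Runs once at application startup.
    @Override
    public void run(ApplicationArguments args) throws Exception {
        List<Document> documents = new ArrayList<>();
        // Read the PDF from the classpath and turn it into Document objects.
        TikaDocumentReader reader = new TikaDocumentReader(new ClassPathResource("CallingRates.pdf"));
        documents.addAll(reader.get());
        vectorStore.add(documents);
        System.out.println("Added documents to vector store");
    }
}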
Step 4: Perform Similarity Search
List<Document> documents = vectorStore.similaritySearch("investigation");
documents.forEach(System.out::println);
Output:
Document{id='7cec17aa-...', metadata={source=story.md, distance=0.7674138}, content='...', media=[]}
Document{id='42726cdb-...', metadata={source=story.text, distance=0.8732333}, content='...', media=[]}
Document{id='9aad7daa-...', metadata={source=story.pdf, distance=0.8799484}, content='...', media=[]}
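In the output above the results come back with an increasing `distance` value in their metadata, which suggests that a smaller distance means a closer match. As a rough sketch under that assumption, the plain-Java snippet below keeps only results under a cutoff. It works on bare metadata maps rather than Spring AI Document objects, and the class name, method name, and the 0.85 cutoff are all hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DistanceFilter {

    // Keeps only the metadata maps whose "distance" value is at most maxDistance.
    static List<Map<String, Object>> withinDistance(List<Map<String, Object>> results, double maxDistance) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> metadata : results) {
            Object value = metadata.get("distance");
            // Entries without a numeric distance are skipped.
            if (value instanceof Number && ((Number) value).doubleValue() <= maxDistance) {
                kept.add(metadata);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Metadata values copied from the sample output.
        List<Map<String, Object>> results = List.of(
                Map.<String, Object>of("source", "story.md", "distance", 0.7674138),
                Map.<String, Object>of("source", "story.text", "distance", 0.8732333),
                Map.<String, Object>of("source", "story.pdf", "distance", 0.8799484));

        // 0.85 is an arbitrary cutoff picked for this example.
        for (Map<String, Object> metadata : withinDistance(results, 0.85)) {
            System.out.println(metadata.get("source")); // prints "story.md"
        }
    }
}
```

What counts as a sensible cutoff depends on the embedding model and the distance function in use, so it is worth tuning against your own data.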
Conclusion
We explored the magical world of vector databases and how Spring AI makes it easier to work with them. From storing and querying vectors to setting up ChromaDB with Docker Compose, we've seen how you can integrate vector search into your applications.
Go forth and build amazing AI-powered search applications!
Happy Coding!